\title{Generative Adversarial Networks}

\subsection{Generative Adversarial Networks}

Generative adversarial networks (GANs) are a powerful approach for
probabilistic modeling \citep{goodfellow2014generative,goodfellow2016nips}.
They posit a deep generative model and they enable fast and accurate
inferences.

We demonstrate with an example in Edward.
An interactive version as a Jupyter notebook is available
\href{http://nbviewer.jupyter.org/github/blei-lab/edward/blob/master/notebooks/gan.ipynb}{here}.

\begin{lstlisting}[language=Python]
M = 128  # batch size during training
d = 100  # latent dimension

data_dir = "/tmp/data"
out_dir = "/tmp/out"
\end{lstlisting}

\subsubsection{Data}

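We use training data from MNIST, which consists of 55,000 $28\times
28$ pixel images \citep{lecun1998gradient}. Each image is represented
as a flattened vector of 784 elements. The helper function defined below
rescales each pixel intensity to lie between 0 and 1 and then binarizes
it by sampling from a Bernoulli distribution with that probability.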

\includegraphics[width=450px]{/images/gan-fig0.png}

The goal is to build and infer a model that can generate high quality
images of handwritten digits.

During training we will feed batches of MNIST digits. We instantiate a
TensorFlow placeholder with a fixed batch size of $M$ images.

We also define a helper function to select the next batch of data
points from the full set of examples. It keeps track of the current
batch index and returns the next batch using the function \texttt{next()}.
We will generate batches from \texttt{x\_train\_generator} during inference.

\begin{lstlisting}[language=Python]
import numpy as np
import tensorflow as tf
from observations import mnist

def generator(array, batch_size):
  """Generate batch with respect to array's first axis."""
  start = 0  # pointer to where we are in iteration
  while True:
    stop = start + batch_size
    diff = stop - array.shape[0]
    if diff <= 0:
      batch = array[start:stop]
      start += batch_size
    else:
      batch = np.concatenate((array[start:], array[:diff]))
      start = diff
    batch = batch.astype(np.float32) / 255.0  # normalize pixel intensities
    batch = np.random.binomial(1, batch)  # binarize images
    yield batch

(x_train, _), (x_test, _) = mnist(data_dir)
x_train_generator = generator(x_train, M)
x_ph = tf.placeholder(tf.float32, [M, 784])
\end{lstlisting}
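
To see how the wrap-around branch of the helper behaves, the following
minimal sketch applies the same index logic to a toy array of five rows
with a batch size of two. It is an illustration only: the names
\texttt{cyclic\_batches} and \texttt{toy} are hypothetical, and the
normalization and binarization steps are left out so that the returned
rows are easy to read.

\begin{lstlisting}[language=Python]
import numpy as np

def cyclic_batches(array, batch_size):
  """Yield fixed-size batches along the first axis, wrapping at the end."""
  start = 0
  while True:
    stop = start + batch_size
    diff = stop - array.shape[0]
    if diff <= 0:
      batch = array[start:stop]
      start += batch_size
    else:
      # Not enough rows left: take the tail and top up from the front.
      batch = np.concatenate((array[start:], array[:diff]))
      start = diff
    yield batch

toy = np.arange(5).reshape(5, 1)  # rows are labeled 0, 1, 2, 3, 4
batches = cyclic_batches(toy, 2)
first_four = [next(batches).ravel().tolist() for _ in range(4)]
print(first_four)
\end{lstlisting}

Every batch has exactly \texttt{batch\_size} rows, which is what a
placeholder with a fixed first dimension requires.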


\subsubsection{Model}

GANs posit generative models using an implicit mechanism. Given some
random noise, the data is assumed to be generated by a deterministic
function of that noise.

Formally, the generative process is
\begin{align*}
\mathbf{\epsilon} &\sim p(\mathbf{\epsilon}), \\
\mathbf{x} &= G(\mathbf{\epsilon}; \theta),
\end{align*}
where $G(\cdot; \theta)$ is a neural network that takes the samples
$\mathbf{\epsilon}$ as input. The distribution
$p(\mathbf{\epsilon})$ is interpreted as random noise injected to
produce stochasticity in a physical system; it is typically a fixed
uniform or normal distribution with some latent dimensionality.

In Edward, we build the model as follows, using \texttt{tf.layers} to
specify the neural network. It defines a 2-layer fully connected neural
network and outputs a vector of length $28\times28$ with values in
$[0,1]$.

\begin{lstlisting}[language=Python]
from edward.models import Uniform

def generative_network(eps):
  net = tf.layers.dense(eps, 128, activation=tf.nn.relu)
  net = tf.layers.dense(net, 784, activation=tf.sigmoid)
  return net

with tf.variable_scope("Gen"):
  eps = Uniform(tf.zeros([M, d]) - 1.0, tf.ones([M, d]))
  x = generative_network(eps)
\end{lstlisting}
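
The generative process can also be mimicked outside of TensorFlow to
make the shapes explicit. The sketch below is an illustration under
stated assumptions, not Edward's implementation: the weights
\texttt{W1}, \texttt{b1}, \texttt{W2}, \texttt{b2} are hypothetical,
randomly initialized, and untrained, and \texttt{toy\_generator} is a
made-up name. It draws uniform noise on $[-1, 1]$ and pushes it through
a ReLU layer followed by a sigmoid layer.

\begin{lstlisting}[language=Python]
import numpy as np

rng = np.random.RandomState(0)
M, d, hidden, n_pixels = 128, 100, 128, 784

# Hypothetical, untrained parameters theta = (W1, b1, W2, b2).
W1 = rng.normal(scale=0.1, size=(d, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, n_pixels))
b2 = np.zeros(n_pixels)

def toy_generator(eps):
  """Deterministic map x = G(eps; theta): ReLU layer, then sigmoid layer."""
  h = np.maximum(eps.dot(W1) + b1, 0.0)
  logits = h.dot(W2) + b2
  return 1.0 / (1.0 + np.exp(-logits))

eps = rng.uniform(-1.0, 1.0, size=(M, d))  # eps ~ p(eps)
x_sample = toy_generator(eps)              # x = G(eps; theta)
print(x_sample.shape)
\end{lstlisting}

All randomness enters through \texttt{eps}: calling
\texttt{toy\_generator} twice on the same noise returns the same images.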

We aim to estimate parameters of the generative network such
that the model best captures the data. (Note that in GANs, we are
interested only in parameter estimation and not inference about any
latent variables.)

Unfortunately, the probability models described above do not admit a tractable
likelihood. This poses a problem for most inference algorithms, as
they usually require evaluating the model's density. Thus we are
motivated to use ``likelihood-free'' algorithms
\citep{marin2012approximate}, a class of methods that assume one
can only sample from the model.

\subsubsection{Inference}

A key idea in likelihood-free methods is to learn by
comparison (e.g., \citet{rubin1984bayesianly,gretton2012kernel}): by
analyzing the discrepancy between samples from the model and samples
from the true data distribution, we have information on where the
model can be improved in order to generate better samples.

In GANs, a neural network $D(\cdot;\phi)$, known as the discriminator,
makes this comparison.
$D(\cdot;\phi)$ takes data $\mathbf{x}$ as input (either
generations from the model or data points from the data set), and it
calculates the probability that $\mathbf{x}$ came from the true data.

In Edward, we use the following discriminative network. It is simply a
feedforward network with one ReLU hidden layer. It returns the
probability in the logit (unconstrained) scale.

\begin{lstlisting}[language=Python]
def discriminative_network(x):
  """Outputs probability in logits."""
  net = tf.layers.dense(x, 128, activation=tf.nn.relu)
  net = tf.layers.dense(net, 1, activation=None)
  return net
\end{lstlisting}

Let $p^*(\mathbf{x})$ represent the true data distribution.
The optimization problem used in GANs is

\begin{equation*}
\min_\theta \max_\phi~
\mathbb{E}_{p^*(\mathbf{x})} [ \log D(\mathbf{x}; \phi) ]
+ \mathbb{E}_{p(\mathbf{x}; \theta)} [ \log (1 - D(\mathbf{x}; \phi)) ].
\end{equation*}

This optimization problem is bilevel: it requires a minimum
with respect to the generative parameters and a maximum with
respect to the discriminative parameters.
In practice, the algorithm proceeds by iterating gradient updates on
each set of parameters. An additional heuristic also modifies the
objective function for the generative model in order to avoid
saturation of gradients
\citep{goodfellow2014on}.
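
As a rough sketch of what the two sets of updates optimize, the snippet
below computes both losses from discriminator logits with NumPy. It
assumes the common non-saturating variant of the heuristic, in which
the generator maximizes $\log D(G(\mathbf{\epsilon}))$ instead of
minimizing $\log (1 - D(G(\mathbf{\epsilon})))$. The function names are
hypothetical, and the code is not claimed to match the internals of any
particular library.

\begin{lstlisting}[language=Python]
import numpy as np

def softplus(z):
  """Numerically stable log(1 + exp(z))."""
  return np.logaddexp(0.0, z)

def discriminator_loss(logits_true, logits_fake):
  """Negative inner objective: -E[log D(x)] - E[log(1 - D(G(eps)))]."""
  # With D = sigmoid(logit):
  #   -log D = softplus(-logit) and -log(1 - D) = softplus(logit).
  return np.mean(softplus(-logits_true)) + np.mean(softplus(logits_fake))

def generator_loss(logits_fake):
  """Non-saturating generator loss: -E[log D(G(eps))]."""
  return np.mean(softplus(-logits_fake))

# A discriminator that cannot tell the two apart outputs logit 0 (D = 1/2).
undecided = np.zeros(4)
loss_d = discriminator_loss(undecided, undecided)
loss_g = generator_loss(undecided)
print(loss_d, loss_g)
\end{lstlisting}

When the discriminator confidently rejects generated samples (large
negative logits), $\log (1 - D)$ is nearly flat in the logit, whereas
$-\log D$ still has a gradient of magnitude close to one. This is the
saturation that the heuristic is meant to avoid.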

Many sources of intuition exist behind GAN-style training. One, which
is the original motivation, is based on the idea that the two neural
networks are playing a game. The discriminator tries to
distinguish true data from samples produced by the generator. The
generator tries to produce samples that the discriminator cannot tell
apart from true data.
The goal of training is to reach a Nash equilibrium.
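
For intuition about this equilibrium, a standard calculation holds
$\theta$ fixed and maximizes the objective pointwise in
$D(\mathbf{x})$, under the idealized assumption that the discriminator
is flexible enough to represent any function. The integrand
$p^*(\mathbf{x}) \log D(\mathbf{x}) + p(\mathbf{x}; \theta) \log (1 -
D(\mathbf{x}))$ is maximized at
\begin{equation*}
D^*(\mathbf{x}) =
\frac{p^*(\mathbf{x})}{p^*(\mathbf{x}) + p(\mathbf{x}; \theta)}.
\end{equation*}
If the generator matches the data distribution, then
$D^*(\mathbf{x}) = 1/2$ everywhere and the objective takes the value
\begin{equation*}
\log \tfrac{1}{2} + \log \tfrac{1}{2} = -\log 4.
\end{equation*}
In this idealized setting, neither player can improve by changing
its own parameters alone.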

Another source is the idea of casting unsupervised learning as
supervised learning
\citep{gutmann2010noise,gutmann2014statistical}.
This allows one to leverage the power of classification, a problem that
in recent years has become (relatively speaking) very easy.

A third comes from classical statistics, where the discriminator is
interpreted as a proxy for the density ratio between the true data
distribution and the model
\citep{sugiyama2012density,mohamed2016learning}. By augmenting an
original problem that may require the model's density (such as
maximum likelihood) with a discriminator, one can recover the
original problem when the discriminator is optimal. Furthermore, this
approximation is fast to compute, and it justifies GANs from the
perspective of approximate inference.
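
To make the density-ratio view concrete, suppose that for a fixed
$\theta$ the discriminator attains the pointwise optimum of the GAN
objective, $D^*(\mathbf{x}) = p^*(\mathbf{x}) / (p^*(\mathbf{x}) +
p(\mathbf{x}; \theta))$. This is an idealized assumption that a trained
network only approximates. Rearranging gives
\begin{equation*}
\frac{D^*(\mathbf{x})}{1 - D^*(\mathbf{x})}
= \frac{p^*(\mathbf{x})}{p(\mathbf{x}; \theta)},
\qquad
\log \frac{D^*(\mathbf{x})}{1 - D^*(\mathbf{x})}
= \log p^*(\mathbf{x}) - \log p(\mathbf{x}; \theta).
\end{equation*}
Under that assumption, the logit output of a discriminator can be read
as an estimate of the log density ratio, even though neither density is
evaluated directly.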

In Edward, the GAN algorithm (\texttt{GANInference}) simply takes the
implicit density model on \texttt{x} as input, bound to its
realizations \texttt{x\_ph}. In addition, a parameterized function
\texttt{discriminator} is provided to distinguish between samples of
the two.

\begin{lstlisting}[language=Python]
import edward as ed

inference = ed.GANInference(
    data={x: x_ph}, discriminator=discriminative_network)
\end{lstlisting}

We'll use Adam as the optimizer for both the generator and the discriminator.
We'll run the algorithm for 15,000 iterations and print progress every
1,000 iterations.

\begin{lstlisting}[language=Python]
optimizer = tf.train.AdamOptimizer()
optimizer_d = tf.train.AdamOptimizer()

inference.initialize(
    optimizer=optimizer, optimizer_d=optimizer_d,
    n_iter=15000, n_print=1000)
\end{lstlisting}

We now form the main loop which trains the GAN. At each iteration, it
takes a minibatch and updates the parameters according to the
algorithm. Every 1,000 iterations, it prints progress and
saves a figure of generated samples from the model.

\begin{lstlisting}[language=Python]
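import os
import matplotlib.pyplot as plt

# Create the output directory if it does not already exist.
if not os.path.exists(out_dir):
  os.makedirs(out_dir)

sess = ed.get_session()
tf.global_variables_initializer().run()

idx = np.random.randint(M, size=16)  # fixed subset of the batch to visualize
i = 0  # counter used to name the saved figures
for t in range(inference.n_iter):
  if t % inference.n_print == 0:
    # Draw a batch of generated images and keep the 16 selected ones.
    samples = sess.run(x)
    samples = samples[idx, ]

    # `plot` is a helper that returns a matplotlib figure of the images;
    # it must be defined before this loop runs.
    fig = plot(samples)
    plt.savefig(os.path.join(out_dir, '{}.png').format(
        str(i).zfill(3)), bbox_inches='tight')
    plt.close(fig)
    i += 1

  # Feed the next minibatch of real images and take one training step.
  x_batch = next(x_train_generator)
  info_dict = inference.update(feed_dict={x_ph: x_batch})
  inference.print_progress(info_dict)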
\end{lstlisting}
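
The loop above calls a helper \texttt{plot} that must be defined before
the loop runs. One possible sketch, assuming each sample is a flattened
$28\times28$ image and that at most 16 samples are passed in, arranges
the images on a $4\times4$ grid with matplotlib. The layout choices
(figure size, spacing, color map) are illustrative assumptions.

\begin{lstlisting}[language=Python]
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display is needed
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np

def plot(samples):
  """Arrange up to 16 flattened 28x28 images on a 4x4 grid."""
  fig = plt.figure(figsize=(4, 4))
  gs = gridspec.GridSpec(4, 4)
  gs.update(wspace=0.05, hspace=0.05)
  for j, sample in enumerate(samples):
    ax = fig.add_subplot(gs[j])
    ax.axis('off')
    ax.set_aspect('equal')
    ax.imshow(sample.reshape(28, 28), cmap='Greys_r')
  return fig

fig = plot(np.random.rand(16, 784))  # random stand-in for generated samples
n_panels = len(fig.axes)
plt.close(fig)
\end{lstlisting}

Returning the figure lets the caller save and close it explicitly, which
keeps memory usage flat over many iterations.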

Examining convergence of the GAN objective can be meaningless in
practice. The algorithm is usually run until some other criterion is
satisfied, such as when the samples look visually acceptable or when
the GAN captures meaningful parts of the data.

\subsubsection{Criticism}

Evaluation of GANs remains an open problem, both in criticizing their
fit to data and in assessing convergence.
Recent advances have considered alternative objectives and
heuristics to stabilize training (see also Soumith Chintala's
\href{https://github.com/soumith/ganhacks}{GAN hacks repo}).

As one approach to criticizing the model, we simply look at generated
images during training. Below we show generations after 14,000
iterations (that is, 14,000 gradient updates of both the generator and
the discriminator).

\includegraphics[width=500px]{/images/gan-fig1.png}

The images are meaningful, albeit a little blurry. Possible
improvements include tuning the hyperparameters of the
optimization, increasing the capacity of the discriminative and
generative networks, and leveraging more prior information (such as
convolutional architectures).

\subsubsection{References}\label{references}
